Distributed storage services supported by a NIC

ABSTRACT

Some embodiments provide a method of providing distributed storage services to a host computer from a network interface card (NIC) of the host computer. At the NIC, the method accesses a set of one or more external storages operating outside of the host computer through a shared port of the NIC that is used not only to access the set of external storages but also to forward packets not related to an external storage. In some embodiments, the method accesses the external storage set by using a network fabric storage driver that employs a network fabric storage protocol to access the external storage set. The method presents the external storage as a local storage of the host computer to a set of programs executing on the host computer. In some embodiments, the method presents the local storage by using a storage emulation layer on the NIC to create a local storage construct that presents the set of external storages as a local storage of the host computer.

BACKGROUND

In recent years, there has been an increase in the use of hardware offload units to assist functions performed by programs executing on host computers. Examples of such hardware offload units include FPGAs, GPUs, smart NICs, etc. Such hardware offload units have improved the performance and efficiency of the host computers by offloading some of the operations that are typically performed by the host computer CPU to the hardware offload unit.

BRIEF SUMMARY

Some embodiments of the invention provide a method of providing distributed storage services to a host computer from a network interface card (NIC) of the host computer. At the NIC, the method accesses a set of one or more external storages operating outside of the host computer through a shared port of the NIC that is used not only to access the set of external storages but also to forward packets not related to the set of external storages or the distributed storage service. In some embodiments, the method accesses the external storage set by using a network fabric storage driver that employs a network fabric storage protocol to access the external storage set.

The method in some embodiments presents the external storage as a local storage of the host computer to a set of one or more programs executing on the host computer. In some embodiments, the local storage is a virtual disk, while the set of programs are a set of machines (e.g., virtual machines or containers) executing on the host computer. In some embodiments, the method presents the local storage by using a storage emulation layer (e.g., a virtual disk layer) on the NIC to create a local storage construct. In some embodiments, the emulated local storage (e.g., the virtual disk) does not represent any storage on the NIC, while in other embodiments, the emulated local storage also represents one or more storages on the NIC.

The method forwards read/write (R/W) requests to the set of external storages when receiving R/W requests from the set of programs to the virtual disk, and provides responses to the R/W requests after receiving responses from the set of external storages to the forwarded read/write requests. In some embodiments, the method translates the R/W requests from a first format for the local storage to a second format for the set of external storages before forwarding the requests to the external storage through the network fabric storage driver. The method also translates responses to these requests from the second format to the first format before providing the responses to a NIC interface of the host computer in order to provide these responses to the set of programs.

In some embodiments, the NIC interface is a PCIe (peripheral component interconnect express) interface, and the first format is an NVMe (non-volatile memory express) format. The second format in some of these embodiments is an NVMeOF (NVMe over fabrics) format and the network fabric storage driver is an NVMeOF driver. In other embodiments, the second format is a remote DSAN (distributed storage area network) format and the network fabric storage driver is a remote DSAN driver. The NIC in some embodiments includes a general purpose central processing unit (CPU) and a memory that stores a program (e.g., a NIC operating system) for execution by the CPU to access the set of external storages and to present the set of external storages as a local storage. In some embodiments, the NIC also includes an application specific integrated circuit (ASIC), which processes packets forwarded to and from the host computer, with at least a portion of this processing including the translation of the R/W requests and responses to these requests. The ASIC in some embodiments is a hardware offload unit of the NIC.

In addition to providing an emulation layer that creates and presents an emulated local storage to the set of programs on the host, the method of some embodiments has the NIC execute a DSAN service for the local storage to improve its operation and provide additional features for this storage. One example of a DSAN service is the vSAN service offered by VMware, Inc. The features of the DSAN service in some embodiments include (1) data efficiency processes, such as deduplication operations, compression operations, and thin provisioning, (2) security processes, such as end-to-end encryption and access control operations, (3) data and life cycle management, such as storage vMotion, snapshot operations, snapshot schedules, cloning, disaster recovery, backup, and long term storage, (4) performance optimizing operations, such as QoS policies (e.g., max and/or min I/O regulating policies), and (5) analytic operations, such as collecting performance metrics and usage data for virtual disks (IO, latency, etc.).

These services are highly advantageous for improving the performance, resiliency and security of the host's storage access that is facilitated through the NIC. For instance, the set of host programs that access the emulated local storage have no insight that data is being accessed on remote storages through network communications. Neither these programs nor other programs executing on the host in some embodiments encrypt their storage access, as the storage being accessed appears to be local to these programs. Hence, it is highly beneficial to use the DSAN services (e.g., their security processes to encrypt the R/W requests and responses) for the R/W requests and responses exchanged between the host and the set of external storages that are made to appear as the local storage.

The preceding Summary is intended to serve as a brief introduction to some embodiments of the invention. It is not meant to be an introduction or overview of all inventive subject matter disclosed in this document. The Detailed Description that follows and the Drawings that are referred to in the Detailed Description will further describe the embodiments described in the Summary as well as other embodiments. Accordingly, to understand all the embodiments described by this document, a full review of the Summary, Detailed Description, the Drawings and the Claims is needed. Moreover, the claimed subject matters are not to be limited by the illustrative details in the Summary, Detailed Description and the Drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

The novel features of the invention are set forth in the appended claims. However, for purpose of explanation, several embodiments of the invention are set forth in the following figures.

FIG. 1 illustrates one manner of using a smart NIC to emulate a local storage that represents several external storages to a virtual machine executing over a hypervisor of a host computer.

FIG. 2 illustrates examples of adapters emulated by the smart NIC.

FIGS. 3 and 4 illustrate two different ways that a DSAN service on a smart NIC serves as a vSAN node in some embodiments.

FIGS. 5 and 6 illustrate two different ways in which the smart NIC of some embodiments translates between the NVMe and NVMe-PCIe storage formats.

FIG. 7 illustrates a VM that executes on a smart NIC to implement third party interfaces (protocols) that are needed to access a third party external storage and that are not natively supported by the smart NIC or the host.

FIG. 8 illustrates a process that some embodiments perform to handle egress communication from the host to a third party external storage.

FIG. 9 illustrates a process that some embodiments perform to handle ingress communication from the third party external storage to the host.

FIG. 10 illustrates a smart NIC emulating a local storage using an external storage and a hardware offload unit driver.

FIG. 11 illustrates a process that the smart NIC OS performs in some embodiments to process an egress communication from the host to an external storage for the example illustrated in FIG. 10.

FIG. 12 illustrates a process that the smart NIC OS performs in some embodiments to process an ingress packet from an external storage to the host.

FIG. 13 illustrates one example of a smart NIC that is used with a host to perform storage emulation.

FIG. 14 illustrates a process performed to process an egress NVMe command by the smart NIC of FIG. 13.

FIG. 15 illustrates another example of a smart NIC that is used with a host to perform storage emulation.

FIG. 16 illustrates a process that is performed to process egress packets from the VM.

FIG. 17 illustrates a system including a host computer and a connected smart NIC being configured with a host computer virtualization program and a smart NIC operating system.

FIG. 18 conceptually illustrates a process for installing programs enabling resource sharing on a host computer and smart NIC.

FIG. 19 conceptually illustrates a process that, in some embodiments, represents sub-operations of an operation described in relation to FIG. 18.

FIG. 20 illustrates a simplified view of a host computer including a baseboard management controller (BMC) and connecting to the smart NIC through a PCIe bus.

FIG. 21 conceptually illustrates a process that is performed by the smart NIC to install the smart NIC operating system as part of the process described in FIG. 19.

FIG. 22 illustrates a smart NIC after the installation is complete with its storage partitioned into a first partition storing the smart NIC operating system and a second partition.

FIG. 23 illustrates a system that includes the host computer, the smart NIC, a set of SDN controller computers, and a set of SDN manager computers.

FIG. 24 illustrates a host computer executing a host computer hypervisor and a set of compute nodes (CN1-CNM) for a first tenant (“T1”) and a set of compute nodes (CNa-CNx) for a second tenant (“T2”).

FIG. 25 illustrates a smart NIC providing compute virtualization and network virtualization to provide virtualized resources (e.g., compute nodes, physical functions, and a set of virtual functions) to be used by compute nodes executing on a host computer.

FIG. 26 illustrates an interaction between an I/O ASIC, a virtual switch, and a fast path entry generator, in some embodiments.

FIG. 27 illustrates a system including a smart NIC and a set of host computers connected to the smart NIC through two different PCIe buses.

FIG. 28 conceptually illustrates an electronic system with which some embodiments of the invention are implemented.

DETAILED DESCRIPTION

In the following detailed description of the invention, numerous details, examples, and embodiments of the invention are set forth and described. However, it will be clear and apparent to one skilled in the art that the invention is not limited to the embodiments set forth and that the invention may be practiced without some of the specific details and examples discussed.

Some embodiments of the invention provide a method of providing distributed storage services to a host computer from a network interface card (NIC) of the host computer. At the NIC, the method accesses a set of one or more external storages operating outside of the host computer through a shared port of the NIC that is used not only to access the set of external storages but also to forward packets not related to the set of external storages or the distributed storage service. The NICs are sometimes referred to herein as smart NICs as they perform multiple types of services and operations. In some embodiments, the method accesses the external storage set by using a network fabric storage driver that employs a network fabric storage protocol (e.g., NVMeOF) to access the external storage set.

The method presents the external storage as a local storage of the host computer to a set of programs executing on the host computer. In some embodiments, the local storage is a virtual disk, while the set of programs are a set of machines (e.g., virtual machines or containers) executing on the host computer. In some embodiments, the method presents the local storage by using a storage emulation layer (e.g., a virtual disk layer) to create a local storage construct that presents the set of external storages as a local storage of the host computer. In some embodiments, the emulated local storage (e.g., the virtual disk) does not represent any storage on the NIC, while in other embodiments, the emulated local storage also represents one or more storages on the NIC.

The method forwards read/write (R/W) requests to the set of external storages when receiving R/W requests from the set of programs to the virtual disk, and provides responses to the R/W requests after receiving responses from the set of external storages to the forwarded read/write requests. In some embodiments, the method translates the R/W requests from a first format for the local storage to a second format for the set of external storages before forwarding the requests to the external storage through the network fabric storage driver. The method also translates responses to these requests from the second format to the first format before providing the responses to a NIC interface of the host computer in order to provide these responses to the set of programs.

In some embodiments, the NIC interface is a PCIe interface, and the first format is an NVMe format. The second format in some of these embodiments is an NVMeOF format and the network fabric storage driver is an NVMeOF driver. The NIC in some embodiments includes a general purpose central processing unit (CPU) and a memory that stores a program (e.g., a NIC operating system) for execution by the CPU to access the set of external storages and to present the set of external storages as a local storage. The NIC in some embodiments is implemented as a system on chip (SoC) with multiple other circuit components. For instance, in some embodiments, the NIC also includes an application specific integrated circuit (ASIC), which processes packets forwarded to and from the host computer, with at least a portion of this processing including the translation of the R/W requests and responses to these requests. This ASIC in some embodiments is a hardware offload unit (HOU) of the NIC, and performs special operations (e.g., packet processing operations, response/request reformatting operations, etc.).
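
As an illustration of this translation, the following Python sketch remaps an R/W request from a local (first) format to a fabric (second) format before it is handed to the network fabric storage driver. This is a minimal sketch only; the dataclasses, field names, and offset scheme are hypothetical stand-ins, not the actual NVMe or NVMeOF wire formats.

    from dataclasses import dataclass

    @dataclass
    class NvmeRequest:             # first format: local NVMe command from the host
        opcode: str                # "read" or "write"
        namespace_id: int
        lba: int                   # logical block address on the emulated local disk
        num_blocks: int

    @dataclass
    class NvmeofRequest:           # second format: fabric command for external storage
        opcode: str
        target_addr: str           # network address of the external storage
        remote_namespace: int
        remote_lba: int
        num_blocks: int

    def to_fabric(req: NvmeRequest, target_addr: str, lba_offset: int) -> NvmeofRequest:
        # Translate a local R/W request into its fabric form, remapping the
        # LBA into the external storage's address space.
        return NvmeofRequest(req.opcode, target_addr, req.namespace_id,
                             req.lba + lba_offset, req.num_blocks)

    # Example: a read of 8 blocks at LBA 4096 on the emulated local disk becomes
    # a fabric read against a (hypothetical) external target.
    fabric_req = to_fabric(NvmeRequest("read", 1, 4096, 8), "10.0.0.5:4420", 1 << 20)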

In addition to providing an emulation layer that creates and presents an emulated local storage to the set of programs on the host, the method of some embodiments has the NIC execute a distributed storage area network (DSAN) service for the local storage to improve its operation and provide additional features for this storage. One example of a DSAN service is the vSAN service offered by VMware, Inc.

The DSAN services are highly advantageous for improving the performance, resiliency and security of the host's storage access that is facilitated through the NIC. For instance, the set of host programs that access the emulated local storage have no insight that data is being accessed on remote storages through network communications. Neither these programs nor other programs executing on the host in some embodiments encrypt their storage access, as the storage being accessed appears to be local to these programs. Hence, it is highly beneficial to use the DSAN services (e.g., their security processes to encrypt the R/W requests and responses) for the R/W requests and responses exchanged between the host and the set of external storages that are made to appear as the local storage.

Although the description of some embodiments refers to emulations of NVMe storage and the NVMe storage protocol, in other embodiments other storage protocols may be emulated instead of or in addition to NVMe storage. Similarly, although the description refers to PCIe buses, in other embodiments, other system buses are used instead of or in addition to a PCIe bus. Although certain drivers and protocols are shown as being used by external storages in various embodiments, other embodiments use other drivers or protocols for external storage. The smart NICs described herein have operating software. In some embodiments, this operating software is an operating system that has direct control over the smart NIC without an intervening program or hypervisor. In other embodiments, the operating software is a hypervisor that runs on top of another operating system of the smart NIC. Still other embodiments use just a hypervisor and no other operating system on the smart NIC.

FIG. 1 illustrates one manner of using a smart NIC to emulate a local storage 160 that represents several external storages 140 to one or more virtual machines 112 executing over the operating system (OS) 100 of a host computer. One example of such a machine is illustrated as a virtual machine (VM) 112, which operates over a hypervisor 114 executing on the host OS 100. The host computer has a set of processors that execute its OS, hypervisor and VM. This computer also includes a smart NIC that has a set of processors and a set of hardware offload units that assist in the operation of the host computer. Specifically, in addition to performing traditional NIC operations to forward packets to and from the host computer (e.g., between the machines executing on the host computer and machines executing on other host computers), the smart NIC performs storage emulation operations that represent multiple external storages 140 as the local storage 160 to the machines executing on the host computer. The smart NIC connects to the PCIe bus 150 of the host.

The smart NIC in some embodiments is a system on chip (SoC) with a CPU, FPGA, memory, IO controller, a physical NIC, and other hardware components. The smart NIC has an operating system (OS) 120 that includes an NVMe driver 122 and a series of storage processing layers 124-127. The discussion below collectively refers to the software executing on the smart NIC as the smart NIC OS 120. However, in some embodiments, the smart NIC OS is a hypervisor, while in other embodiments a hypervisor executes on top of the smart NIC OS and some or all of the storage processing layers are part of this hypervisor. In the discussion below, the components that are attributed to the smart NIC OS 120 are components of the hypervisor that serves as the smart NIC OS or executes on top of the smart NIC OS in some embodiments. In other embodiments, these are components of a smart NIC OS that is not a hypervisor. In still other embodiments, some of these components belong to the smart NIC OS, while other components belong to the hypervisor executing on the smart NIC OS.

The NVMe driver 122 is a driver for the PCIe bus 150. This driver relays NVMe formatted R/W requests from the host hypervisor 114 to the storage processing layers, and relays responses to these requests from the storage processing layers to the host hypervisor 114. The storage processing layers include an NVMeOF driver 124, a core storage service 125, a DSAN service 126, and a virtual device service 127. The virtual device service includes an NVMe emulator 128.

The smart NIC OS 120 uses the NVMeOF driver 124 in some embodiments to access one or more external storages 140. Specifically, the smart NIC OS 120 emulates a local NVMe storage 160 to represent several external storages 140 to the machines (e.g., VM 112) executing on the host. From the host's point of view, the VM 112 operates on the emulated local storage 160 as if it were a local NVMe storage connected through the PCIe bus 150.

To access the external storages 140, the smart NIC (e.g., the NVMeOF driver) uses one or more of its shared ports 130. The shared ports are not only used for the purposes of accessing external storage 140, but are used for other purposes as well (e.g., used to forward packets to and from destinations other than the external storages). The NVMeOF driver 124 handles the NVMeOF protocols needed for communicating with the external storages 140 through network fabric (e.g., through routers).

The smart NICs illustrated in FIG. 1 as well as in other figures perform operations other than storage emulation. For instance, the smart NICs perform regular packet processing in order to forward packets to and from other destinations outside of the smart NIC's host computer that are not external storages. Examples of such other destinations include machines executing on other host computers. However, the illustration presented in FIG. 1 and the other figures focuses on the components of the smart NIC that facilitate the storage emulation operations in order not to obscure the description of some embodiments with unnecessary detail.

The core storage service 125 provides one or more core storage operations. One example of such operations is a set of adapter services that allow the smart NIC to emulate one or more storage adapters, with each adapter logically connecting to one or more external storages 140 and facilitating a different communication mechanism (e.g., transport mechanism) for communicating with the external storages. FIG. 2 illustrates examples of such adapters. In this example, four adapters are illustrated. These include an RDMA storage adapter, a TCP storage adapter, an iSCSI adapter, and an iSER adapter.

Through a management interface, an administrator in some embodiments can specify one or more adapters to use to access an external storage, or a set of two or more external storages. In some embodiments, more than one adapter is specified for an external storage when the administrator wants to specify a multipath pluggable storage architecture (PSA) approach to accessing the storage. Once the administrator specifies an adapter, the network manager that provides the interface sends the definition of the specified adapter to a network controller, which then configures the smart NIC to implement and configure a new driver, or reconfigure an existing driver, to access the external storage according to the adapter's specified definition. Different methods for configuring a smart NIC in some embodiments are described below.
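
A rough sketch of this adapter model follows; the adapter registry, port numbers, and configuration dictionary are assumptions made for illustration, not an actual network manager or controller API. Specifying several adapters for one storage yields a multipath (PSA-style) configuration.

    ADAPTERS = {
        "rdma":  {"transport": "RDMA",  "port": 4420},
        "tcp":   {"transport": "TCP",   "port": 4420},
        "iscsi": {"transport": "iSCSI", "port": 3260},
        "iser":  {"transport": "iSER",  "port": 3260},
    }

    def configure_storage(storage_id: str, adapter_names: list) -> dict:
        # Build the configuration a controller might push to the smart NIC
        # for one external storage (illustrative only).
        paths = [dict(ADAPTERS[name], adapter=name) for name in adapter_names]
        return {"storage": storage_id, "multipath": len(paths) > 1, "paths": paths}

    # Two adapters for one storage -> a multipath configuration.
    print(configure_storage("ext-store-1", ["rdma", "tcp"]))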

The DSAN service 126 provides one or more DSAN operations to improve the operation of the emulated local storage 160 and provide additional features for this storage. These operations are needed because the emulated local storage is not really local but rather an emulation of one or more external storages. As such, the DSAN service 126 addresses one or more issues that can arise when accessing such a virtual “local” storage.

For instance, in some embodiments, the DSAN service provides data resiliency and I/O control that are not generally needed when a host machine is accessing a physical local storage over NVMe. A local drive is not subject to interception over a network and is not prone to packet duplication in the manner of packets sent over a network. These issues arise from emulating the local storage using external storage accessed over a network; therefore, the DSAN layer 126 resolves such issues before the data is presented to the higher layers.

In some embodiments, the DSAN operations include (1) data efficiency processes, such as deduplication operations, compression operations, and thin provisioning, (2) security processes, such as end-to-end encryption and access control operations, (3) data and life cycle management, such as storage vMotion, snapshot operations, snapshot schedules, cloning, disaster recovery, backup, and long term storage, (4) performance optimizing operations, such as QoS policies (e.g., max and/or min I/O regulating policies), and (5) analytic operations, such as collecting performance metrics and usage data for virtual disks (IO, latency, etc.).
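
The following sketch strings a few of these DSAN stages together for a write payload on its way to the external storages. The stage choices and helper functions are illustrative assumptions; in particular, the XOR "cipher" merely stands in for the authenticated encryption a real DSAN layer would use.

    import hashlib
    import zlib

    def dedup_key(payload: bytes) -> str:
        # (1) data efficiency: content hash used for deduplication bookkeeping.
        return hashlib.sha256(payload).hexdigest()

    def dsan_egress(payload: bytes, key: bytes):
        digest = dedup_key(payload)
        data = zlib.compress(payload)                 # (1) compression
        data = bytes(b ^ key[i % len(key)]            # (2) security: placeholder
                     for i, b in enumerate(data))     # for end-to-end encryption
        return digest, data

    digest, wire_data = dsan_egress(b"block payload", key=b"secret")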

One example of a DSAN service 126 is the vSAN service offered by VMware, Inc. In some such embodiments, the smart NIC includes a local physical storage that can serve as a vSAN storage node. In other embodiments, the smart NIC does not have a local physical storage, or has such a storage but this data storage cannot participate as a vSAN storage node. In such embodiments, the smart NIC serves as a remote vSAN client node, and its vSAN layer is a vSAN proxy that uses one or more remote vSAN nodes that perform some or all of the vSAN operations and then direct the vSAN proxy accordingly.

FIG. 3 illustrates such an approach. As shown, the vSAN proxy 326 uses a remote vSAN client protocol to communicate with the other vSAN nodes 305, which direct the vSAN operations of the vSAN proxy. The vSAN nodes 305 provide some or all of the external storages 140 in some embodiments of the invention. In this example, the network storage driver 324 is an iSCSI driver, although other network storage drivers are used in other embodiments.

In other embodiments, the DSAN service of the smart NIC does not use a remote vSAN client protocol to communicate with the other vSAN nodes. For instance, as shown in FIG. 4, a DSAN service 126 in some embodiments uses a vSAN over NVMeOF protocol 426 to communicate with the other vSAN nodes. This protocol is defined in some embodiments to allow the smart NIC to be a vSAN node even when it does not have a local physical storage, or has such a storage but this data storage cannot participate as a vSAN storage node. In some embodiments, the emulated local storage 160 (that is defined by one or more external storages 140 through emulation operations of the NVMe emulator 128 of the virtual device service 127 of the smart NIC OS 120) serves as the local storage that allows the smart NIC to be a vSAN node.

The virtual device service 127 has an NVMe emulator 128 that emulates the local NVMe storage 160 to represent the set of external storages 140 that are accessed through the NVMeOF driver 124 and the intervening network. As part of this emulation, the virtual device layer 127 maps outgoing NVMe access commands to external storage access commands, and incoming external storage responses to NVMe responses. When multiple external storages are used, this mapping involves mapping between a storage location in the emulated local storage 160 and a storage location in one or more external storages 140. One example of a virtual device emulator that can be used for the NVMe emulator is the virtual device emulator of the vSphere software of VMware, Inc.
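
One way to picture this location mapping is the striping sketch below, in which a block address on the emulated local disk resolves to a particular external storage and an address within it. The round-robin chunking scheme is a hypothetical example, not the vSphere emulator's actual layout.

    EXTERNALS = ["ext-store-1", "ext-store-2", "ext-store-3"]
    CHUNK_BLOCKS = 2048   # contiguous blocks placed on one external storage

    def resolve(local_lba: int):
        # Map an LBA on the emulated local disk to (external storage, remote LBA)
        # using simple round-robin striping across the external storages.
        chunk = local_lba // CHUNK_BLOCKS
        target = EXTERNALS[chunk % len(EXTERNALS)]
        remote_lba = (chunk // len(EXTERNALS)) * CHUNK_BLOCKS + local_lba % CHUNK_BLOCKS
        return target, remote_lba

    assert resolve(0) == ("ext-store-1", 0)
    assert resolve(2048) == ("ext-store-2", 0)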

Part of the NVMe emulator's operation also involves this emulator using the hardware offload unit (e.g., an ASIC) of the smart NIC to convert the NVMe access commands from an NVMe-PCIe format to an NVMe format, and to convert the external storage responses received at the emulator 128 from the NVMe format to an NVMe-PCIe format (e.g., to remove PCIe header information from outgoing commands, and to add PCIe header information to incoming responses). This is further described below by reference to FIGS. 5 and 6.
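
Conceptually, the conversion offloaded to the hardware offload unit reduces to stripping transport framing from outgoing commands and prepending it to incoming responses, as in the sketch below. The 16-byte header layout is invented for illustration and is not the real PCIe TLP format.

    HDR_LEN = 16  # illustrative framing length, not the real PCIe TLP size

    def strip_pcie(cmd: bytes) -> bytes:
        # Outgoing: NVMe-PCIe -> NVMe (drop the transport framing).
        return cmd[HDR_LEN:]

    def add_pcie(resp: bytes, requester_id: int) -> bytes:
        # Incoming: NVMe -> NVMe-PCIe (prepend framing for the host PCIe bus).
        header = requester_id.to_bytes(2, "big").ljust(HDR_LEN, b"\x00")
        return header + resp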

The host OS 100, the hypervisor 114 or the VM 112 in some embodiments have their own drivers (not shown) for sending and receiving data through the PCIe bus 150. The host OS 100, the hypervisor 114 or the VM 112 treats the virtual local storage 160 as a physical local storage, without having to deal with the operations that the smart NIC performs to send data to and receive data from the set of external storages 140.

The DSAN services 126 of FIGS. 3 and 4 (the remote vSAN client of FIG. 3 and the vSAN over NVMeOF of FIG. 4) are two ways of offering disaggregated storage services. Today, many DSANs (e.g., VMware's vSAN architecture) are part of a hyper-converged solution, in which each vSAN node offers both storage and compute functionality. As illustrated by FIGS. 3 and 4, disaggregated storage in some embodiments refers to storage in a system which has some DSAN nodes (e.g., some hardware boxes) that provide only compute functionality and no storage functionality. In some embodiments, one or more DSAN nodes only offer storage functionality and no compute functionality. Such a disaggregated system allows more flexibility in datacenters by allowing the operators of the datacenters to add more storage boxes than compute boxes or more compute boxes than storage boxes, whichever is necessary, rather than adding additional compute boxes with storages whenever additional capacity of only one of those resources is necessary.

FIGS. 5 and 6 illustrate two different ways in which the smart NIC of some embodiments translates between the NVMe and NVMe-PCIe formats (e.g., removes PCIe headers from outgoing storage access commands and adds PCIe header information to incoming storage responses). Both of these techniques use a hardware offload unit (HOU) 505 of the smart NIC to perform these operations. This HOU is an ASIC that has multiple packet processing stages that can be configured to remove or add PCIe headers to storage commands and responses to and from the external storages. In both approaches illustrated in FIGS. 5 and 6, the NVMe emulator 128 uses an HOU interface 520 to communicate with the HOU 505.

In FIG. 5, the HOU interface executes on a VM 510 that executes on the smart NIC. The smart NIC OS is a hypervisor and the VM 510 executes on top of this hypervisor in some embodiments. As shown, the NVMe emulator 528 of the virtual device layer 527 communicates with the HOU interface 520 to forward storage access commands and responses for processing by the HOU 505 and to receive processed commands and responses from the HOU 505. In other embodiments, the smart NIC executes the HOU interface on machines (e.g., Pods or containers) other than VMs. One example of an HOU interface and an HOU are the Snap software and hardware offered by Nvidia, Inc. In some embodiments, the HOU Snap software operates in the VM as it requires a different OS (e.g., requires Ubuntu) than the smart NIC OS (which might be ESX offered by VMware, Inc.).

In some embodiments, a smart NIC is able to employ HOU drivers that are adapted to the smart NIC OS (e.g., HOU drivers supplied along with the smart NIC operating software or subsequently downloaded, etc.) as the interface with the smart NIC HOU. The HOU drivers that are adapted to run directly on a particular type of operating software are referred to as being “native” to that operating software. In FIG. 6, the HOU interface 520 is implemented as a native HOU driver 610 of the smart NIC. This approach works when the driver is available natively for the smart NIC OS. Otherwise, the driver has to operate in a VM 510 as in FIG. 5.

More generally, a VM is used by the smart NIC of some embodiments to perform other processes and/or support other protocols that are not natively supported by the smart NIC. For instance, FIG. 7 illustrates a VM 710 that executes on a smart NIC OS 700 to implement a third party interface 725 (e.g., a third party storage protocol) that is needed to access a third party external storage 712 and that is not natively provided by the smart NIC OS or the host OS. In this example, the third party storage interface 725 is part of an interface 520 for an HOU 715 of the smart NIC.

At the direction of the HOU interface 520 (also called the HOU handler), the HOU 715 performs the storage command and response processing operations needed to implement the third party storage protocol and to convert between the command and response formats of the host's local storage (e.g., its NVMe local storage) and the third party external storage 712. As shown, the third party storage interface 725 passes storage access commands to, and receives storage access responses from, a shared port 720 of the NIC.

FIG. 8 illustrates a process 800 that some embodiments perform to handle egress communication from the host to a third party external storage. As shown, the process 800 starts (at 805) when a workload VM or an application running on the host generates an NVMe command (with data). At 810, the NVMe command is then encapsulated into a PCIe-NVMe command (i.e., encapsulated with a PCIe header) at a local storage controller of the host computer, and is forwarded along the PCIe bus to the smart NIC 700. At the smart NIC 700, the PCIe-NVMe command is passed (at 815) to the HOU handler 520 running inside of the VM 710.

Next, at 820, the third party storage interface 725 strips off the PCIe headers and passes the NVMe command back to the HOU handler 520. To do this, the third party storage interface 725 uses the HOU in some embodiments. The HOU handler next uses (at 825) the smart NIC HOU to change the format of the NVMe command to a command that comports with the third party storage 712, and passes (at 830) this command to the third party storage 712 along a shared port of the smart NIC. In some embodiments, the command is passed to the third party storage 712 as one or more packets transmitted through the network fabric.
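
Process 800 can be summarized by the short sketch below, in which each step mirrors one numbered operation above. The class and conversion rules are hypothetical stand-ins for the HOU handler 520 and the third party storage interface 725.

    class ThirdPartyStorageInterface:
        def strip_pcie(self, cmd: bytes) -> bytes:
            return cmd[16:]              # 820: drop the (illustrative) PCIe header

        def to_third_party(self, nvme_cmd: bytes) -> bytes:
            return b"TP1" + nvme_cmd     # 825: HOU reformats for the third party

    def handle_egress(pcie_nvme_cmd: bytes, send_on_shared_port) -> None:
        tp = ThirdPartyStorageInterface()
        nvme_cmd = tp.strip_pcie(pcie_nvme_cmd)
        send_on_shared_port(tp.to_third_party(nvme_cmd))   # 830: out the shared port

    # Example: forward a dummy command and print what leaves the shared port.
    handle_egress(b"\x00" * 16 + b"NVME-WRITE", print)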

FIG. 9 illustrates a process 900 that some embodiments perform to handle ingress communication from the third party external storage to the host. As shown, the process 900 starts (at 905) when it gets a storage-access response (e.g., a read response) through a shared port of the NIC from the third party external storage 712. At 910, the smart NIC OS determines that the storage-access response is from a third party external storage and needs to be processed by the third party storage interface 725 of the HOU handler 520.

At 915, the HOU interface 520 gets the storage-access response and provides it to the third party storage interface 725, which then converts (at 920) the storage-access response from the third party format to an NVMe format and passes the storage-access response back to the HOU interface 520. Next, at 925, the HOU interface encapsulates the NVMe storage-access response with a PCIe header, and the response is passed to the host's local storage controller along the PCIe bus 150. The local storage controller then removes (at 930) the PCIe header, and provides the NVMe storage-access response to a workload VM or an application running on the host.

As described with respect to FIG. 1, the smart NICs of some embodiments provide a DSAN service to perform various security and efficiency operations for the virtual local storage that is emulated with one or more external storages. However, the smart NIC in other embodiments bypasses the DSAN layer that performs DSAN operations in order to increase the speed of data transfer. Instead of the HOU drivers described above, some such embodiments use other protocols. For example, some embodiments use HOU upper level protocols (ULP). Upper level protocols (e.g., IPoIB, SRP, SDP, iSER, etc.) enable standard data networking, storage and file system applications to operate over InfiniBand.

FIG. 10 illustrates a smart NIC emulating a local storage for the VMs 112 of a host 1000 by using an external storage 1040 and an HOU driver 1022. Like the above-described HOU interfaces and drivers, the HOU driver 1022 forwards data messages for processing by the smart NIC HOU 505 and receives processed data messages from the HOU 505. Specifically, the HOU driver 1022 uses the HOU 505 to perform the packet processing needed to convert between the data message NVMe formats and the NVMe PCIe formats. In this example, the HOU driver 1022 exchanges data messages with a kernel NVMe layer 1028, which exchanges data messages with an NVMe RDMA driver 1024 and/or an NVMe TCP driver 1026. The NVMe RDMA and TCP drivers send and receive data messages to and from the external storage 1040 through an intervening network fabric (e.g., intervening routers and switches).

One advantage of the approach of FIG. 10 is that the smart NIC 1020 transfers data quickly between the host and the external storage 1040 that is used to emulate the host's local storage. This transfer is fast because it uses the kernel NVMe 1028 as a bridge and it does not use a DSAN layer 1030 on the smart NIC OS 1020. This embodiment can tap into NVMe RDMA offload capability by using the NVMe RDMA driver 1024. In some embodiments, the HOU of the smart NIC 1020 can strip Ethernet headers from an incoming data packet, identify the particular NVMe PCIe controller (here, the HOU ULP driver 1022) that needs to receive the packet, and pass the packet to that NVMe PCIe controller. Thus, the smart NIC CPU cost of bridging through the kernel NVMe layer 1028 is minimal. This speed comes at the cost of other features, such as the DSAN service 1030, which provides useful security and performance operations for the emulated local storage.

In the example of FIG. 10, the DSAN module 1056, the virtual device emulator 1057 and the multipath PSA service 1055 are provided for one or more VMs 112 through the host hypervisor 114. Specifically, in this example, a multipath PSA layer 1055 exists between the VMs 112 executing on the host OS 1000 and the NVMe PCIe driver 1060 of the OS. Through this PSA layer 1055, the host can use multiple paths to the same external storage by using different NVMe PCIe drivers executing on the host OS 1000 (although only one NVMe PCIe driver 1060 is shown in FIG. 10). In other words, for the multi-pathing, different PCIe drivers are also used in some embodiments to access the same external storage through different paths. Also, in some embodiments, different NVMe PCIe drivers are used to emulate different local storages from different external storages 1040.

The virtual device emulator 1057 is used to emulate a local virtual disk from several external storages 1040 for one or more VMs 112. As mentioned above, the vSphere software's virtual device layer is used to implement the virtual device emulator of the host hypervisor or smart NIC hypervisor in some embodiments. In some embodiments, the same or different PCIe drivers 1060 are used to access different external storages 1040 that are used to emulate one virtual disk. The DSAN module 1056 performs DSAN services like those described above for the emulated local storages.

In some embodiments, the host hypervisor and smart NIC hypervisor can be configured to provide different storage services for different workload VMs 112. For instance, the storage access commands and responses for one workload VM are processed by the storage services 1055-57, while the storage access commands and responses for another workload VM skip these storage services. Similarly, the storage access commands and responses of one workload VM are processed by the storage services 125-127 of the smart NIC as shown in FIG. 1, while the storage access commands and responses of another workload VM are just processed by the kernel NVMe module 1028 and the NVMeOF drivers 1024 and 1026 of FIG. 10.

FIG. 11 illustrates a process 1100 that the smart NIC OS 1020 performs in some embodiments to process an egress communication from the host to an external storage for the example illustrated in FIG. 10. As shown, the process starts (at 1105) when an NVMe command (with data) is generated by a VM 112 on the host. This command (at 1110) is encapsulated with PCIe header information to produce a PCIe-NVMe command (with data) at a local storage controller (not shown) of the host, and is passed along to the PCIe bus 150. Next, at 1115, the HOU driver 1022 (e.g., an HOU ULP driver) receives this command through the PCIe bus 150, and uses the HOU to strip out the PCIe headers and produce the NVMe command (with data).

At 1120, the HOU driver 1022 passes the NVMe command to the kernel NVMe module 1028, which maps this command to an NVMeOF transport controller. The kernel NVMe module 1028 in some embodiments is transport agnostic, and can be configured to use any one of a number of different NVMe transport drivers. At 1120, the kernel NVMe 1028 identifies the NVMeOF controller (i.e., the NVMe RDMA controller 1024 or the NVMe TCP controller 1026) that needs to receive this NVMe command. This identification is based on the NVMe command parameters that identify the transport protocol to use. These command parameters are provided by the host's multipath PSA layer 1055.

The kernel NVMe module (at 1125) passes the NVMe command to the identified NVMeOF controller, which then generates one or more NVMeOF packets to forward (at 1130) the NVMe command to the destination external storage through a shared port of the smart NIC. As mentioned above, both NVMe RDMA 1024 and NVMe TCP 1026 are provided by the smart NIC OS 1020 for accessing remote external storages 1040 through the shared port(s) 130 of the smart NIC. In some embodiments, the kernel NVMe 1028 works like a multiplexer that provides NVMe storage access to the HOU driver 1022 using different transports, such as NVMe RDMA 1024 and NVMe TCP 1026, at the same time. After 1130, the process 1100 ends.
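
The multiplexing role of the kernel NVMe module can be sketched as a transport dispatch table, as below. The registry and parameter names are assumptions for illustration, not the actual kernel interfaces.

    class Transport:
        def __init__(self, name: str):
            self.name = name
        def send(self, cmd: dict) -> None:
            print(f"{self.name}: forwarding {cmd['op']} to {cmd['target']}")

    TRANSPORTS = {"rdma": Transport("NVMe/RDMA"), "tcp": Transport("NVMe/TCP")}

    def kernel_nvme_dispatch(cmd: dict) -> None:
        # Operations 1120-1125: pick the NVMeOF controller named by the command's
        # transport parameter (set by the host's PSA layer) and hand the command off.
        TRANSPORTS[cmd["transport"]].send(cmd)

    kernel_nvme_dispatch({"op": "write", "target": "ext-store-1", "transport": "rdma"})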

FIG. 12 illustrates a process 1200 that the smart NIC OS 1020 performs in some embodiments to process an ingress packet from an external storage to the host. As shown, the process starts (at 1205) when an external storage 1040 generates and forwards an NVMeOF command (with data) that is received as a set of one or more network packets at a shared port of the smart NIC through network fabric (e.g., through one or more switches and/or routers). The port (at 1210) passes the received packet to the NVMe RDMA controller 1024 or the NVMe TCP controller 1026 depending on the transport protocol used by the external storage. The NVMe controller (at 1215) receives the NVMeOF packet in its transport specific format, removes the transport header data, and provides an NVMe command (with data) to the kernel NVMe 1028.

At 1220, the kernel NVMe 1028 maps the received NVMe command to the HOU driver 1022, as the NVMe command needs to go to the host. In some embodiments, the kernel NVMe 1028 creates a record when it processes an egress packet at 1125 and uses this record to perform its mapping at 1220. In some embodiments, the kernel NVMe 1028 provides the NVMe command to the HOU driver 1022 with the controller of the emulated local storage 160 as the command's destination. At 1225, the HOU driver 1022 then encapsulates the NVMe command with a PCIe header by using the smart NIC's HOU and then sends the NVMe command along the host PCIe bus to the local storage controller of the emulated local storage 160. The host PCIe bus then provides (at 1230) the NVMe command to the local storage controller through the NVMe PCIe driver 1060. This controller then removes (at 1230) the PCIe header and provides the NVMe command to the destination VM 112. The process 1200 then ends.

In some embodiments, the smart NICs are used as storage access accelerators. FIGS. 13 and 15 illustrate two such examples. FIG. 13 illustrates how a smart NIC serves as a network accelerator to one or more workload VMs 1312 executing over a host hypervisor 1314 that operates over a host OS 1300. In this example, the remote storage services protocol runs inside the host and the smart NIC OS 1320 just runs network accelerators. Also, the host hypervisor 1314 provides emulation services in this example that allow it to present one or more external storages 1340 as a local storage to a VM 1312. In some embodiments, the hypervisor 1314 is the ESX hypervisor of VMware, Inc. In some such embodiments, a virtual NVMe device emulation module 1311 of the VMware vSphere software provides the NVMe device emulation that presents multiple external storages 1340 as a single local NVMe storage to the VM 1312.

In some embodiments, the hypervisor 1314 also includes the DSAN service layer 1313, which provides distributed storage services for the emulated local NVMe storage. As mentioned above, the distributed storage services in some embodiments account for the VM 1312 having no knowledge regarding the plurality of external storages being used to emulate the local storage. These DSAN services improve this emulated storage's operation and provide additional features for it. Examples of such features in some embodiments include (1) data efficiency processes, such as deduplication operations, compression operations, and thin provisioning, (2) security processes, such as end-to-end encryption and access control operations, (3) data and life cycle management, such as storage vMotion, snapshot operations, snapshot schedules, cloning, disaster recovery, backup, and long term storage, (4) performance optimizing operations, such as QoS policies (e.g., max and/or min I/O regulating policies), and (5) analytic operations, such as collecting performance metrics and usage data for virtual disks (IO, latency, etc.). One example of a DSAN service is the vSAN service offered by the VMware vSphere software. The DSAN service layer 1313 also includes a multipathing PSA layer in some embodiments.

The DSAN service module 1313 receives and sends storage related NVMe commands from and to the kernel NVMe module 1315. The kernel NVMe module 1315 interacts with either the NVMe RDMA driver 1316 or the NVMe TCP driver 1317 to receive and send these NVMe commands. These drivers exchange these NVMe commands with the smart NIC OS 1320 through one or more virtual functions (VFs) 1322 defined for these drivers on the smart NIC OS.

In some embodiments, the smart NIC OS can present the smart NIC as multiple physical functions (PFs) connected to the host computer. The PCIe bus 150, in some embodiments, allows for the creation of these PFs. A PF, in some embodiments, can be further virtualized as multiple virtual functions (VFs). More specifically, in some embodiments, physical functions and virtual functions refer to ports exposed by a smart NIC using a PCIe interface to connect to the host computer over the PCIe bus. A PF refers to an interface of the smart NIC that is recognized as a unique resource with a separately configurable PCIe interface (e.g., separate from other PFs on a same smart NIC). In some embodiments, each PF is executed by the processing units (e.g., microprocessors) of the host computer.

A VF refers to a virtualized interface that is not fully configurable as a separate PCIe resource, but instead inherits some configuration from the PF with which it is associated while presenting a simplified configuration space. VFs are provided, in some embodiments, as a passthrough mechanism that allows compute nodes executing on a host computer to receive data messages from the smart NIC without traversing a virtual switch of the host computer. The VFs, in some embodiments, are provided by virtualization software executing on the smart NIC. In some embodiments, each VF is executed by the processing units (e.g., microprocessors) of the smart NIC.

The VFs and PFs, in some embodiments, are deployed to support storage and compute virtualization modules. For example, a PF or VF can be deployed to present a storage or compute resource provided by the smart NIC as a local device (i.e., a device connected to the host computer by a PCIe bus). Defining such VFs is further described below.
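
The PF/VF relationship described above can be modeled roughly as below: a VF inherits most of its configuration from its parent PF while exposing a simplified configuration space. Field names are illustrative, not the SR-IOV register layout.

    from dataclasses import dataclass

    @dataclass
    class PhysicalFunction:
        pf_id: int
        mtu: int = 1500                      # separately configurable per PF
        offloads: tuple = ("csum", "tso")

    @dataclass
    class VirtualFunction:
        vf_id: int
        parent: PhysicalFunction             # inherits PF configuration
        def config(self) -> dict:
            # Simplified config space: inherited values plus the VF's own id.
            return {"vf": self.vf_id, "mtu": self.parent.mtu,
                    "offloads": self.parent.offloads}

    pf = PhysicalFunction(pf_id=0)
    print(VirtualFunction(vf_id=3, parent=pf).config())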

The PF 1370 on the host has the corresponding VF 1322 on the smart NIC. The PF 1370 represents a shared NIC port to the NVMeOF drivers 1316 and 1317, which run on the host and convert the NVMe storage access commands to network packets. These drivers use this representative port 1370 to forward storage access packets to an external storage through the VF 1322 of the smart NIC 1320, and to receive storage access response packets from the external storage 1340 through the VF 1322 of the smart NIC 1320.

When the VF 1322 does not know how to process a packet (e.g., when it receives a first packet of a new flow for which it does not have a forwarding rule), the VF passes the packet through a “slow path” that includes the virtual switch 1326 of the virtualization layer 1327, which then determines how to forward the packet and provides the VF with a forwarding rule for forwarding the packet. On the other hand, when the VF 1322 knows how to process a packet (e.g., when the VF receives another packet of a flow that it has previously processed and/or for which it has a forwarding rule), the VF passes the packet through a “fast path,” e.g., passes a packet of a previously processed flow directly to the NIC driver 1325 for forwarding to an external storage 1340. Accordingly, in the example illustrated in FIG. 13, the VF 1322 is a network accelerator that facilitates the forwarding of the packets related to the external storages.

In some embodiments, the VF 1322 uses the smart NIC HOU 505 to perform its fast path forwarding. When the HOU is not programmed with the flow-processing rules needed to process a new flow, the VF 1322 in some embodiments passes the packet to the virtualization layer 1327, which either identifies the flow-processing rule from a rule cache or passes the packet to a manager (executing on the smart NIC or on an external computer) that then determines the flow-processing rule and passes this rule back to the virtualization layer to use to forward the packet and to program the HOU. Once programmed, the VF can use the HOU to process subsequent packets of this flow.
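
The fast-path/slow-path split reduces to a flow-table lookup, as in the sketch below: a miss takes the slow path through the virtualization layer, which installs a rule so later packets of the flow are handled in hardware. The plain dictionary stands in for HOU flow-table state.

    flow_table = {}   # flow key -> action programmed into the HOU

    def slow_path(flow_key) -> str:
        # The virtual switch (or an external manager) determines the rule
        # and programs it into the HOU for subsequent packets.
        action = "forward:shared-port-0"
        flow_table[flow_key] = action
        return action

    def process_packet(flow_key) -> str:
        action = flow_table.get(flow_key)
        if action is None:               # miss: consult the virtualization layer
            return slow_path(flow_key)
        return action                    # hit: fast path handled by the HOU

    key = ("10.0.0.9", "10.0.0.5", 4420)
    process_packet(key)                  # first packet: slow path, installs the rule
    process_packet(key)                  # later packets: fast path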

FIG. 14 illustrates a process 1400 performed to process an egress NVMe command by the smart NIC 1320 of FIG. 13. In this example, the VM 1312 is presented an NVMe device through a virtual NVMe device emulation provided by the hypervisor 1314 (e.g., provided by a virtual NVMe device emulation module of the vSphere software of VMware Inc.). The NVMe device present in the VM 1312 generates (at 1405) an NVMe command (with data). The VM's NVMe driver passes (at 1410) this NVMe command through the virtual device layer 1311 and the DSAN service layer 1313 to the kernel NVMe module 1315. At 1415, the kernel NVMe module 1315 identifies the NVMeOF controller that needs to process this NVMe command, and provides the command to this controller 1316 or 1317.

The NVMe RDMA 1316 or NVMe TCP 1317 module running on the host (at 1420) converts the NVMe command to one or more NVMe network packets (NVMeOF packets) and passes the packets to a PF 1370 of the PCIe bus 150. At 1425, the PF 1370 adds PCIe header information to the NVMe network packets, and then passes the packets along the PCIe bus 150. The PCIe bus 150 creates a mapping between the PF 1370 and the VF module 1322 running on the smart NIC. Hence, the VF module 1322 receives each NVMeOF packet through the PCIe bus 150.

At 1430, the VF module 1322 then transfers the NVMeOF packet either directly through the fast path to the NIC driver 1325, or indirectly to the NIC driver 1325 through the slow path that involves the virtual switch 1326. The NIC driver 1325 then forwards the NVMeOF packet through a shared port of the smart NIC, so that this packet can be forwarded through intervening network fabric (e.g., intervening switches/routers) to reach its destination external storage 1340. In some embodiments, the fast-path processing of the VF 1322 allows the VF to directly pass the packet to the shared port of the smart NIC. The process then ends.

FIG. 15 illustrates another way of using the smart NIC as a network accelerator in some embodiments. In this example, one VM 1512 executes the NVMeOF driver, so that it can bypass not only the DSAN service layer 1313, but also the kernel NVMe 1315 and the NVMeOF drivers 1316-17 of the host hypervisor 1514 that executes over a host OS 1500. This approach provides the fastest access for a VM to one or more external storages through the VM's NVMeOF driver, which in this example is a GOS NVMe fabric driver. However, in this approach, no local storage is emulated for the VM 1512 by either the host or the smart NIC. This VM simply accesses the external storages through its NVMeOF driver. Specifically, in the example of FIG. 15, the GOS NVMeOF driver inside the VM 1512 presents the NVMe device to the VM 1512. Also, a PF 1580 is directly assigned to the VM 1512 using a passthrough mode, such as SRIOV mode or SIOV mode.

For the PF 1580, the smart NIC OS in FIG. 15 defines a VF 1523 to process the packets associated with the VM 1512. In both FIGS. 13 and 15, the smart NIC OS has a virtual switch 1326 to perform software switching operations and a network virtualization layer 1327 to perform network virtualization operations for the smart NIC. In some embodiments, these operations are analogous to the operations that traditionally have been performed on host computers to provide software switching and network virtualization operations. The smart NIC OS 1320 also has a NIC driver 1325 to communicate with the external storages 1340 through one or more ports of the smart NIC.

FIG. 16 illustrates a process 1600 that is performed to process egress packets from the VM 1512. As shown, the process starts (at 1605) when an application running on the VM 1512 generates an NVMe command (with data), and provides this command to the NVMeOF driver executing on this VM. This driver then converts (at 1610) the NVMe command to a set of one or more network packets, which it then provides to the PF 1580 directly.

The PF 1580 provides (at 1615) the set of network packets that contains the NVMe command (with data) to the VF2 1523, which is a high speed network adapter provided by the smart NIC 1320. As described above for the VF 1322 and operation 1430 of FIG. 14, the VF2 1523 (at 1620) transfers the set of network packets (containing the NVMe command/data) through the direct fast path or the indirect slow path to a shared NIC port for forwarding to an external storage 1340. In some embodiments, the shared NIC port can be used by both VFs 1322 and 1523 as well as other modules of the smart NIC for other forwarding operations.

The smart NIC operating system in some embodiments is provided with the host-computer hypervisor program as part of a single downloaded package. For instance, some embodiments provide a method for provisioning a smart NIC with a smart NIC operating system for enabling resource sharing on the smart NIC connected to a host computer. The method, in some embodiments, is performed by the host computer and begins when the host computer receives (1) a host-computer hypervisor program for enabling resource sharing on the host computer and (2) the smart NIC operating system. In some embodiments, the host-computer hypervisor program includes the smart NIC hypervisor program. The host computer then installs the host-computer hypervisor program and provides the smart NIC operating system to the smart NIC for the smart NIC to install. One of ordinary skill in the art will appreciate that a hypervisor program is used as an example of virtualization software (e.g., software enabling resource sharing for a device executing the software).

The smart NIC, in some embodiments, is a NIC that includes (i) an application-specific integrated circuit (ASIC), (ii) a general purpose central processing unit (CPU), and (iii) memory. The ASIC, in some embodiments, is an I/O ASIC that handles the processing of packets forwarded to and from the computer and is at least partly controlled by the CPU. The CPU executes a NIC operating system in some embodiments that controls the ASIC and can run other programs, such as API translation logic to enable the compute manager to communicate with a bare metal computer. The smart NIC also includes a configurable peripheral component interconnect express (PCIe) interface in order to connect to the other physical components of the bare metal computer system (e.g., the x86 CPU, memory, etc.). Via this configurable PCIe interface, the smart NIC can present itself to the bare metal computer system as a multitude of devices, including a packet processing NIC, a hard disk (using non-volatile memory express (NVMe) over PCIe), or other devices.

Although not necessary for managing a bare metal computer, the NIC operating system of some embodiments is capable of executing a virtualization program (similar to a hypervisor) that enables sharing resources (e.g., memory, CPU resources) of the smart NIC among multiple machines (e.g., VMs) if those VMs execute on the computer. The virtualization program can provide compute virtualization services and/or network virtualization services similar to a managed hypervisor. These network virtualization services, in some embodiments, include segregating data messages into different private (e.g., overlay) networks that are defined over the physical network (shared between the private networks), forwarding the data messages for these private networks (e.g., performing switching and/or routing operations), and/or performing middlebox services for the private networks.

The host-computer hypervisor program and the smart NIC operating system, in some embodiments, are programs that do not have previous versions installed on the computer or the smart NIC. In other embodiments, the host-computer hypervisor program and the smart NIC operating system received by the host computer are update programs for previously installed versions of the host-computer hypervisor program and the smart NIC operating system. After a host-computer hypervisor program and the smart NIC operating system are received, the host computer, in some embodiments, receives an additional program for updating the smart NIC operating system and provides the received program to the smart NIC for the smart NIC to update the smart NIC operating system.

In some embodiments, after receiving the host-computer hypervisor program and the smart NIC operating system, the host computer detects (or determines) that the host computer is connected to the smart NIC. In some embodiments, the connection is made over a standard PCIe connection and the smart NIC is detected as a peripheral device that supports the installation of the smart NIC operating system. The host computer provides, based on the detection, the smart NIC operating system to the smart NIC for the smart NIC to install. In some embodiments, the smart NIC operating system is sent to the smart NIC along with an instruction to the smart NIC to install the smart NIC operating system.

In some embodiments, the host computer includes a local controller that receives the host-computer hypervisor program and the smart NIC operating system. The local controller, in some embodiments, provides the host-computer hypervisor program and the smart NIC operating system to a compute agent that installs the host-computer hypervisor program on the host computer to enable the host computer to share resources among a set of compute nodes (e.g., virtual machines, containers, Pods, etc.). The host-computer hypervisor program and the smart NIC operating system are particular examples of virtualization software that is used, in some embodiments, to enable resource sharing for the host computer and smart NIC, respectively.

As mentioned above, the smart NIC in some embodiments includes a set of ASICs, a general purpose CPU, and a memory. The set of ASICs, in some embodiments, includes an ASIC for processing packets forwarded to and from the host computer as well as other ASICs for accelerating operations performed by the smart NIC on behalf of the host computer (e.g., encryption, decryption, storage, security, etc.). The smart NIC operating system, in some embodiments, includes virtualization programs for network virtualization, compute virtualization, and storage virtualization. The virtualization programs, in some embodiments, enable sharing the resources of the smart NIC among multiple tenants of a multi-tenant datacenter.

The network virtualization program provides network virtualization services on the smart NIC. The network virtualization services, in some embodiments, include forwarding operations (e.g., network switching operations and network routing operations). The forwarding operations are performed, in some embodiments, on behalf of multiple logically separate networks implemented over a shared network of a datacenter. Forwarding packets for different logical networks, in some embodiments, includes segregating packets for each logically separate network into the different logically separate networks. Forwarding operations for the different logical networks, in some embodiments, are implemented as different processing pipelines that perform different sets of operations. The different sets of operations include, in some embodiments, different logical packet forwarding operations (e.g., logical switching, logical routing, logical bridging, etc.) and different middlebox services (e.g., a firewall service, a load balancing service, etc.).
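
As an illustrative, non-limiting sketch of this per-network pipeline dispatch (all names below are hypothetical and not drawn from the figures), a forwarding element might key a distinct pipeline of operations on each logical network identifier:

    # Hypothetical sketch: one processing pipeline per logical network.
    from dataclasses import dataclass, field
    from typing import Callable, Dict, List

    @dataclass
    class Pipeline:
        # Ordered operations, e.g., logical switching/routing followed by
        # middlebox services such as a firewall or load balancer.
        stages: List[Callable[[dict], dict]] = field(default_factory=list)

        def process(self, packet: dict) -> dict:
            for stage in self.stages:
                packet = stage(packet)
            return packet

    def make_stage(name: str) -> Callable[[dict], dict]:
        def stage(packet: dict) -> dict:
            packet.setdefault("trace", []).append(name)  # record the operation
            return packet
        return stage

    # Packets are segregated into logical networks by identifier and then
    # run through that network's own set of operations.
    pipelines: Dict[int, Pipeline] = {
        101: Pipeline([make_stage("t1-switch"), make_stage("t1-route"),
                       make_stage("t1-firewall")]),
        202: Pipeline([make_stage("t2-switch"), make_stage("t2-route"),
                       make_stage("t2-load-balance")]),
    }

    def forward(packet: dict) -> dict:
        return pipelines[packet["logical_network_id"]].process(packet)

In this sketch, a packet tagged with identifier 101 traverses only the first tenant's operations, mirroring the segregation behavior described above.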

The compute virtualization program, in some embodiments, provides virtualized compute resources (virtual machines, containers, Pods, etc.) that execute over the compute virtualization program. The storage virtualization program, in some embodiments, provides storage virtualization services on the smart NIC. The virtualized storage, in some embodiments, includes one or more of virtual storage area networks (vSANs), virtual volumes (vVOLs), and other virtualized storage solutions. The virtualized storage appears to the connected host computer as a local storage, in some embodiments, even when the physical resources that are the backend of the virtualized storage are provided by a distributed set of storages of multiple physical host computers.

FIG. 17 illustrates a system 1700 including a host computer 1710 and a connected smart NIC 1740 being configured with host computer virtualization software 1730 and a smart NIC operating system 1760. Host computer 1710 includes a set of physical resources 1720 and smart NIC 1740 includes a separate set of physical resources 1750. The set of physical resources 1720 of the host computer 1710, in some embodiments, includes any or all of a set of general purpose central processing units (CPUs), memory, and storage. The set of physical resources 1750 of the smart NIC 1740, in some embodiments, includes any or all of a set of general purpose central processing units (CPUs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), memory, and storage. The configuration of the host computer 1710 and smart NIC 1740 depicted in FIG. 17 will be described in relation to FIG. 18.

FIG. 18 conceptually illustrates a process 1800 for installing programs enabling resource sharing on a host computer and smart NIC. The program for enabling resource sharing on the host computer, in some embodiments, is one of a hypervisor program, a virtual machine monitor, or other virtualization software. In some embodiments, the program for enabling resource sharing on the host computer executes as an operating system (OS) executing directly on the hardware of the host computer, while in other embodiments, the program executes as a software layer on top of an OS. Similarly, the program for enabling resource sharing on the smart NIC, in some embodiments, executes as an operating system (OS) executing directly on the hardware of the smart NIC, while in other embodiments, the program executes as a software layer on top of an OS.

The process 1800, in some embodiments, is performed by a host computer (e.g., host computer 1710), which in some embodiments is an x86 server provided by a datacenter provider. The process 1800 begins by receiving (at 1810) a host-computer virtualization program (e.g., host-computer hypervisor program 1715) that includes a smart NIC operating system (e.g., smart NIC hypervisor program 1745). The host-computer virtualization program (e.g., host-computer hypervisor program 1715) and smart NIC operating system (e.g., smart NIC hypervisor program 1745), in some embodiments, are installer programs that install virtualization software (e.g., a software virtualization layer or a virtualization OS). The host-computer virtualization program, in some embodiments, is received from a network controller computer to configure the host computer to support virtualized compute nodes, storage, network cards, etc., to be implemented on the host computer for a virtual or logical network associated with the network controller computer.

Stage 1701 of FIG. 17 illustrates a host computer 1710 that has not yet been configured by virtualization software receiving a host-computer hypervisor program 1715 that includes smart NIC hypervisor program 1745. In some embodiments, the two programs 1715 and 1745 are received separately (e.g., simultaneously as part of a package or sequentially). In other embodiments, the program received (at 1810) is an update to one or both of the host-computer virtualization program and the smart NIC operating system. The update, in some embodiments, is processed as an update to the host-computer virtualization program even in the case that the update includes only updates to the smart NIC operating system.

After receiving (at 1810) the host-computer virtualization program and the smart NIC operating system, the host computer then installs (at 1820) the received host-computer virtualization program (e.g., host-computer hypervisor program 1715) on the host computer. The virtualization program, in some embodiments, is a hypervisor such as ESXi™ provided by VMware, Inc. or other virtualization programs. As shown in stage 1702 of FIG. 17, the host computer 1710 installs a host computer hypervisor 1730 (dashed lines in FIG. 17 indicating software executing on a device) after receiving the host-computer hypervisor program 1715. After installing the host-computer virtualization program 1715, the host computer 1710 is able to provide virtual resources (e.g., compute nodes, virtual switches, virtual storage, etc.) based on the physical resources 1720 of the host computer 1710.

After, or as part of, installing (at 1820) the host-computer virtualization program, the host computer detects (at 1830) that the smart NIC operating system is included in the host-computer virtualization program. In some embodiments, detecting (at 1830) that the smart NIC operating system is incorporated in the host-computer virtualization program triggers a set of operations to program any virtualization-capable smart NICs connected to the host computer. The set of operations, in some embodiments, includes an operation to detect whether a virtualization-capable smart NIC is connected to the host computer.

The host computer determines (at 1840) that a virtualization-capable smart NIC is connected to the host computer. In some embodiments, determining (at 1840) that a virtualization-capable smart NIC is connected to the host computer is part of the installation process for the host-computer virtualization program. Determining (at 1840) that a virtualization-capable smart NIC is connected to the host computer, in some embodiments, is based on a set of components exposed to the host computer by the smart NIC. In some embodiments, the host-computer virtualization program (e.g., an ESXi™ installer) queries a baseboard management controller (BMC) of the host computer to determine (at 1840) that the smart NIC is compatible with the smart NIC operating system (e.g., a smart NIC operating system (OS) such as ESXio™). In some embodiments, a virtualization-capable smart NIC is identified to the connected host computer during a previously performed process that configures the connection between the host computer and the smart NIC.

After determining (at 1840) that a virtualization-capable smart NIC is connected to the host computer, the host computer provides (at 1850) the smart NIC operating system to the smart NIC for the smart NIC to install a virtualization layer that enables the smart NIC to share its resources. FIG. 17 illustrates, in stage 1702, that the host computer 1710 sends the smart NIC hypervisor program 1745 to smart NIC 1740 for the smart NIC 1740 to install a smart NIC operating system. In stage 1703 of FIG. 17, the smart NIC 1740 installs smart NIC hypervisor 1760 to enable virtualization of the physical resources 1750 of the smart NIC 1740.
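
For orientation only, the overall flow of process 1800 can be summarized in the following sketch; every helper name here is hypothetical and stands in for installer behavior, not the actual installer logic:

    # Hypothetical summary of process 1800 (FIG. 18).
    def install_on_host(program: str) -> None:
        print("installing host virtualization layer from", program)   # 1820

    def virtualization_capable_smart_nic_present() -> bool:
        return True   # 1840: e.g., via a BMC query or exposed PCIe components

    def provide_to_smart_nic(image: str) -> None:
        print("providing", image, "to smart NIC for installation")    # 1850

    def process_1800(package: dict) -> None:
        host_program = package["host_hypervisor_program"]   # 1810: receive
        nic_os = package.get("smart_nic_os")                # may be bundled
        install_on_host(host_program)                       # 1820
        if nic_os is not None:                              # 1830: detect bundle
            if virtualization_capable_smart_nic_present():  # 1840
                provide_to_smart_nic(nic_os)                # 1850

    process_1800({"host_hypervisor_program": "esxi-installer",
                  "smart_nic_os": "nic-os-image"})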

In some embodiments, providing (at 1850) the smart NIC operating system for the smart NIC to install includes multiple sub-operations. FIG. 19 conceptually illustrates a process 1900 that, in some embodiments, represents the sub-operations that are included in operation 1850. FIG. 19 is described, at least in part, in relation to FIGS. 20-22. Process 1900 begins by configuring (at 1910) the smart NIC to boot from an image stored on the host computer. In some embodiments, the host-computer virtualization program invokes BMC APIs to configure (at 1910) the smart NIC to enable a unified extensible firmware interface (UEFI) SecureBoot on the smart NIC.

After configuring the smart NIC to enable booting from an image stored on the host computer, the smart NIC operating system is staged (at 1920) on the host computer for the smart NIC to use in an initial boot-up process. The host-computer virtualization program, in some embodiments, invokes BMC APIs to stage (at 1920) the smart NIC operating system (e.g., ESX.io) in BMC storage as an image file (e.g., as an ISO, DD, tgz, or zip file) for the smart NIC to perform the initial boot-up of the smart NIC operating system.
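
A minimal sketch of operations 1910 and 1920, assuming an invented BMC API surface (the actual BMC APIs are not specified here), might look as follows:

    # Hypothetical BMC interactions for operations 1910 and 1920.
    class Bmc:
        def __init__(self) -> None:
            self.storage: dict = {}
            self.uefi_secure_boot = False

        def enable_uefi_secure_boot(self) -> None:
            # 1910: configure the smart NIC to boot from a host-staged image.
            self.uefi_secure_boot = True

        def stage_image(self, name: str, image: bytes) -> None:
            # 1920: hold the NIC OS image (e.g., ISO/DD/tgz/zip) in BMC
            # storage for the smart NIC's initial boot-up.
            self.storage[name] = image

    bmc = Bmc()
    bmc.enable_uefi_secure_boot()
    bmc.stage_image("nic-os.iso", b"<image bytes>")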

FIG. 20 illustrates a simplified view of host computer 2010, including BMC 2018 connecting to smart NIC 2040 through PCIe bus 2042. FIG. 20 illustrates that the BMC 2018 provides a smart NIC operating system 2045 stored in BMC memory 2019 to the CPU 2044 of the smart NIC 2040 for the CPU 2044 to perform a first boot from the image stored in BMC memory 2019 (operation “1”). FIG. 20 also illustrates components of the smart NIC, which will be described below by reference to other figures.

At 1930, the process 1900 provides the smart NIC virtualization program for storage on partitioned memory. FIG. 21 conceptually illustrates a process 2100 that is performed by the smart NIC to install the smart NIC operating system as part of process 1900 of FIG. 19. In some embodiments, the smart NIC, at this point, performs (at 2110) a boot sequence for the smart NIC operating system from the BMC storage of the host computer. During the initialization, the local storage (e.g., embedded multi-media controller (eMMC), or other memory) is detected and partitioned (at 2120). In some embodiments, the local storage is detected based on initialization scripts of the smart NIC operating system. The detected storage is then partitioned.

The smart NIC operating system (e.g., ESX.io bootloader and system modules) is then stored (at 2130) in the local partitioned storage. In some embodiments, the smart NIC operating system is copied (at 2130) from the host computer to the smart NIC local storage based on a process of the smart NIC operating system. In other embodiments, the host-computer virtualization program detects that the smart NIC has booted from the image and partitioned the storage and provides the smart NIC operating system to the smart NIC for storage (at 2130). FIG. 22 illustrates a smart NIC 2240 after the installation is complete with its storage 2246 partitioned into a first partition 2246a storing the smart NIC operating system (NIC OS) 2245 and a second partition 2246b.

FIG. 20 illustrates that after providing the smart NIC operating system 2045 to the CPU 2044 (in operation “1”), the smart NIC operating system is provided to the memory 2046 (as operation “2”). In this example, the smart NIC OS 2045 is provided through the external PCIe bus 2042 and an internal PCIe bus 2043. The internal PCIe bus 2043 connects the I/O ASIC side of the NIC (including ASIC 2047 and physical ports 2041) with the CPU side of the NIC (including CPU 2044 and memory 2046). While FIG. 20 and several other figures described below use an internal PCIe bus 2043, other embodiments do not use an internal PCIe bus. For instance, some embodiments implement the NIC as a system-on-chip and have the different circuits of the NIC communicate through other interconnects and buses.

The smart NIC operating system then verifies (at 2140) that the installation was successful. In some embodiments, verifying (at 2140) that the installation was successful includes verifying that the smart NIC device and functions are successfully enumerated. The verification (at 2140), in some embodiments, is based on a set of post-installation scripts. In some embodiments, the verification includes a communication to the host-computer virtualization program installation process that the installation on the smart NIC was successful.

The host computer BMC then configures (at 1940) the smart NIC to boot from the local copy of the smart NIC operating system. FIG. 20 illustrates that after the smart NIC operating system is stored in the memory 2046, the CPU accesses the smart NIC operating system (SN HV 2045) from the memory 2046 (as operation “3”).

The host computer then completes (at 1950) the installation of the host-computer virtualization program and reboots the host computer and the smart NIC. These operations (1940 and 1950) are reflected in process 2100 of FIG. 21 in operation 2150, in which the smart NIC is configured to boot from the locally-stored smart NIC operating system and is rebooted. In some embodiments, the host computer is rebooted first to complete the installation of the host-computer virtualization program, and the host computer and smart NIC are then rebooted again to complete the installation of the smart NIC operating system. The host computer, in some embodiments, is rebooted first and then initiates a reboot of the smart NIC (from the smart NIC operating system stored in the memory of the smart NIC). In embodiments in which the smart NIC supports compute nodes of multiple tenants, an attempt to install the smart NIC operating system by another tenant's host-computer virtualization program installer is blocked. In some embodiments, the installation by a second tenant is unnecessary and would destroy any virtualization already performed for the first tenant. In such embodiments, the smart NIC or the host-computer virtualization program installer is programmed to determine whether the smart NIC operating system is already installed. Additionally, the smart NIC or the host-computer hypervisor program installer, in some embodiments, is programmed to identify the tenant that installed the smart NIC operating system so as to allow updates to the smart NIC operating system made by that tenant.
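
The smart NIC side of this installation (process 2100, together with operations 1940 and 1950) can be sketched as follows; every helper here is hypothetical and stands in for firmware or installer behavior:

    # Hypothetical sketch of process 2100 (FIG. 21) on the smart NIC.
    def boot_from(source: str) -> None:
        print("booting NIC OS from", source)

    def detect_and_partition_local_storage() -> tuple:
        return ("partition-a", "partition-b")      # 2120: e.g., eMMC partitions

    def copy_image(image: str, partition: str) -> None:
        print("storing", image, "on", partition)   # 2130

    def devices_enumerated() -> bool:
        return True                                 # 2140: post-install scripts

    def install_nic_os(staged_image: str) -> None:
        boot_from("BMC storage")                    # 2110: first boot
        part_a, _part_b = detect_and_partition_local_storage()
        copy_image(staged_image, part_a)
        assert devices_enumerated(), "installation verification failed"
        boot_from(part_a)                           # 2150: reboot from local copy

    install_nic_os("nic-os.iso")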

As illustrated in FIG. 17, the host-computer hypervisor program 1715 and the smart NIC hypervisor 1745, in some embodiments, are programs that do not have previous versions installed on the host computer 1710 or the smart NIC 1740. In other embodiments, the host-computer hypervisor program 1715 and the smart NIC hypervisor 1745 received by the host computer 1710 are update programs for previously-installed versions of the host-computer hypervisor program 1715 and the smart NIC hypervisor 1745. In yet other embodiments, after a host-computer hypervisor program 1715 and the smart NIC hypervisor 1745 are received, the host computer 1710 receives an additional program for updating the smart NIC hypervisor 1745 and provides the received program to the smart NIC 1740 for the smart NIC 1740 to update the smart NIC hypervisor 1745.

FIG. 22 illustrates a smart NIC 2240 after a set of configuration processes similar or identical to those described above in relation to FIGS. 18, 19, and 21. After installing the smart NIC operating system 2245, the CPUs 2044 execute a NIC operating system 2260 (e.g., a hypervisor, virtualization OS, or virtual machine monitor, etc.) that includes, in some embodiments, a compute virtualization module 2261, a network virtualization module 2262, and a storage virtualization module 2263. In some embodiments, a smart NIC operating system 2260 supports only a subset of these functions, supports additional functions, or supports a different combination of functions. The network virtualization module (or capability) 2262, in some embodiments, is used to present the smart NIC 2240 as multiple physical functions (PFs) connected to a single host computer (e.g., a server) or a set of host computers. Each PF, in some embodiments, can be further virtualized as multiple virtual functions (VFs).

As used in this document, physical functions (PFs) and virtual functions (VFs) refer to ports exposed by a smart NIC using a PCIe interface to connect to a host computer (or set of host computers) over a PCIe bus. A PF refers to an interface of the smart NIC that is recognized as a unique resource with a separately configurable PCIe interface (e.g., separate from other PFs on a same smart NIC). A VF refers to a virtualized interface that is not fully configurable as a separate PCIe resource, but instead inherits some configuration from the PF with which it is associated while presenting a simplified configuration space. VFs are provided, in some embodiments, as a passthrough mechanism that allows compute nodes executing on a host computer to receive data messages from the smart NIC without traversing a virtual switch of the host computer. The VFs, in some embodiments, are provided by virtualization software executing on the smart NIC. The VFs and PFs, in some embodiments, are deployed to support the storage and compute virtualization modules 2263 and 2261. For example, a PF or VF can be deployed to present a storage or compute resource provided by the smart NIC as a local device (i.e., a device connected to the host computer by a PCIe bus).
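
To make the PF/VF relationship concrete, the following hypothetical model (a sketch, not an actual PCIe or SR-IOV implementation) shows a VF inheriting most of its configuration from its parent PF while exposing a simplified, partially overridable configuration space:

    # Hypothetical model of PFs and VFs exposed over PCIe.
    from dataclasses import dataclass, field
    from typing import List, Optional

    @dataclass
    class PhysicalFunction:
        pf_id: int
        device_type: str                    # e.g., "nic" or "nvme-storage"
        config: dict = field(default_factory=dict)
        vfs: List["VirtualFunction"] = field(default_factory=list)

        def add_vf(self, vf_id: int,
                   overrides: Optional[dict] = None) -> "VirtualFunction":
            # A VF is not separately configurable as a full PCIe resource;
            # it inherits the PF's configuration and exposes a simplified,
            # partially overridable configuration space.
            vf = VirtualFunction(vf_id, self,
                                 {**self.config, **(overrides or {})})
            self.vfs.append(vf)
            return vf

    @dataclass
    class VirtualFunction:
        vf_id: int
        parent: "PhysicalFunction"
        config: dict

    pf = PhysicalFunction(0, "nic", {"mtu": 1500, "num_queues": 8})
    vf = pf.add_vf(1, {"num_queues": 2})   # e.g., a passthrough port for a VM
    assert vf.config == {"mtu": 1500, "num_queues": 2}

Here the VF keeps the PF's MTU but narrows its queue count, mirroring the inheritance described above.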

The smart NIC 2240 also includes a local memory 2246 and a set of general purpose CPUs 2044 that are used to install (and support) the virtualization layer 2330, which enables resource sharing of elements on the I/O portion and a compute portion of the smart NIC (e.g., the CPUs 2044, memory 2246, etc.). As shown, smart NIC operating system 2245 is stored in memory 2246 (and more specifically, in memory partition 2246a), which communicates with the CPUs 2044 to execute the smart NIC operating system 2245 to install the NIC operating system 2260 (e.g., ESX.io). In some embodiments, the memory 2246 is an embedded multi-media controller (eMMC) memory that includes flash memory and a flash memory controller. The memory 2246 and the CPUs 2044 communicate, in some embodiments, with other elements of the smart NIC 2240 over an internal PCIe bus 2043.

Smart NIC 2240 also includes an I/O ASIC 2047 (among a set of additional ASICs or field-programmable gate arrays (FPGAs) not shown) that can be used to accelerate data message forwarding or other networking functions (encryption, security operations, storage operations, etc.). A set of physical ports 2041 that provide connections to a physical network and interact with the I/O ASIC 2047 is also included in smart NIC 2240. The I/O ASIC and physical ports depicted in FIG. 20 perform similar operations.

The host computer and smart NIC, in some embodiments, are elements of a datacenter that implements virtual networks for multiple tenants. In some embodiments, the virtual networks implemented in the datacenter include one or more logical networks including one or more logical forwarding elements, such as logical switches, routers, gateways, etc. In some embodiments, a logical forwarding element (LFE) is defined by configuring several physical forwarding elements (PFEs), some or all of which execute on host computers or smart NICs along with deployed compute nodes (e.g., VMs, Pods, containers, etc.). The PFEs, in some embodiments, are configured to implement two or more LFEs to connect two or more different subsets of deployed compute nodes. The virtual network, in some embodiments, is a software-defined network (SDN) such as that deployed by NSX-T™ and includes a set of SDN managers and SDN controllers. In some embodiments, the set of SDN managers manage the network elements and instruct the set of SDN controllers to configure the network elements to implement a desired forwarding behavior for the SDN.

FIG. 23 illustrates a system 2300 that includes the host computer 2310, the smart NIC 2340, a set of SDN controller computers 2370, and a set of SDN manager computers 2380. The set of SDN manager computers 2380 implement a management plane for a particular SDN (e.g., a cloud provider SDN, or a tenant SDN executed in the cloud or in a private datacenter). The set of SDN manager computers 2380 receive input from a user to implement a certain SDN configuration including, in some embodiments, configuration for a set of LFEs, a set of compute nodes, and a set of storage resources. The set of SDN manager computers 2380 communicate the desired configurations to the set of SDN controller computers 2370 implementing a control plane for the SDN. The set of SDN controllers 2370 generate configuration data for a set of host computers (including host computer 2310) and provide control messages to a local controller 2390 on the host computer 2310 to configure a set of the network elements specified by a user. In some embodiments, the SDN manager computers 2380 and SDN controller computers 2370 are the NSX-T™ managers and controllers licensed by VMware, Inc.

As shown, the set of SDN controller computers 2370 send a host-computer hypervisor program 2315 to a local controller 2390 of host computer 2310 through smart NIC 2340 (using physical port (PP) 2341 and a PCIe bus 2342). In some embodiments, the host-computer hypervisor program 2315 is an installer program executed by the compute resources 2321 of host computer 2310 to install a virtualization layer 2330 (e.g., a hypervisor such as ESXi™ provided by VMware, Inc.) to enable the physical resources 2320 of host computer 2310 (including compute, network, and storage resources 2321, 2322, and 2323) to be shared among multiple virtualized machines.

Local controller 2390 receives the host-computer hypervisor program 2315 and provides it to the physical resources 2320 (e.g., runs the host-computer hypervisor program 2315 using the compute resources 2321 of the host computer 2310). Based on the host-computer hypervisor program 2315, a virtualization layer 2330 is installed on the host computer 2310 (shown using dashed lines to distinguish between hardware and software of the host computer 2310). While virtualization layer 2330 is shown as including a compute virtualization module 2261, a network virtualization module 2262, and a storage virtualization module 2263, in some embodiments, a virtualization layer 2330 supports only a subset of these functions, supports additional functions, or supports a different combination of functions. As described above in relation to FIG. 18, as part of executing host-computer hypervisor program 2315 to install the virtualization layer 2330, the host computer 2310 will provide smart NIC operating system 2345 to smart NIC 2340 for the smart NIC 2340 to execute (e.g., install a smart NIC virtualization layer).
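
Reduced to a hypothetical sketch (the names below are invented, not the actual controller interfaces), the control flow of FIG. 23 is simply: receive the program, install it locally, then push the bundled NIC OS onward:

    # Hypothetical sketch of the local controller's role in FIG. 23.
    class Host:
        def install(self, program: dict):
            print("installing virtualization layer on host")
            return program.get("smart_nic_os")   # bundled NIC OS, if any

    class SmartNic:
        def install(self, image: str) -> None:
            print("installing NIC OS", image, "on smart NIC")

    class LocalController:
        def __init__(self, host: Host, nic: SmartNic) -> None:
            self.host, self.nic = host, nic

        def on_program_received(self, program: dict) -> None:
            nic_os = self.host.install(program)   # run installer on host CPUs
            if nic_os is not None:
                self.nic.install(nic_os)          # forward NIC OS to the NIC

    LocalController(Host(), SmartNic()).on_program_received(
        {"host_hypervisor_program": "esxi", "smart_nic_os": "nic-os"})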

FIG. 24 illustrates a host computer 2410 executing a host computer hypervisor 2430 and a set of compute nodes (CN1-CNM) 2411 for a first tenant (“T1”) and a set of compute nodes (CNa-CNx) 2412 for a second tenant (“T2”). FIG. 24 also illustrates a logical view of the separate logical networks defined for T1 and T2. As shown, the logical networks for the separate tenants include a set of logical routers and logical switches (2431 and 2432 for T1, and 2433 and 2434 for T2, respectively) that connect the compute nodes of the tenant. The different logical networks are both implemented by the host computer hypervisor 2430 and, in some embodiments, the smart NIC 2440. The host computer hypervisor 2430, in some embodiments, includes a virtual switch (e.g., a software switch) that implements the LRs and LSs for the different tenants on the host computer 2410. In some embodiments, the I/O ASIC 2047 of the smart NIC 2440 is configured by the host computer hypervisor 2430 to perform logical routing and logical switching operations for the separate tenants. In other embodiments, the I/O ASIC 2047 of the smart NIC 2440 is configured by a hypervisor (not shown) of the smart NIC 2440.

The I/O ASIC 2047 of the smart NIC 2440 and the host computer hypervisor 2430, in some embodiments, implement separate processing pipelines for the separate tenants (e.g., the separate logical networks). Data messages, e.g., data messages ingressing from T1 and T2, are segregated into the different processing pipelines of the different logical networks of the different tenants, in some embodiments, based on logical network identifiers (e.g., virtual local area network (VLAN) or virtual extensible LAN (VXLAN) identifiers).

FIG. 25 illustrates a smart NIC 2540 providing compute virtualization 2561 and network virtualization 2562 to provide virtualized resources (e.g., compute nodes 2513, physical functions 2570a-n, and a set of virtual functions 2571) to be used by compute nodes 2511 executing on a host computer 2510 (depicted as executing within a host computer hypervisor 2530). In some embodiments, compute nodes 2513 are edge gateway machines that provide gateway services for compute nodes 2511 executing on host computer 2510.

Network virtualization 2562 provides a virtualized PCIe interface that presents the PCIe bus 2542 as including a set of physical functions (PFs 2570a-n) as defined above and, for a set of the physical functions, a set of virtual functions 2571. Both the host computer hypervisor 2530 and the NIC OS 2560 execute a virtual switch (2532 and 2573, respectively) that provides logical routing and logical switching operations for compute nodes (virtual machines, containers, Pods, etc.). In some embodiments, the virtual switch 2573 on the smart NIC 2540 provides logical forwarding operations for compute nodes on both the smart NIC 2540 and on the host computer 2510. In some embodiments, the virtual switch 2573 interacts with the I/O ASIC 2047 to perform data message processing offload (e.g., flow processing offload) on behalf of the host computer 2510.

FIG. 26 illustrates an interaction between an I/O ASIC 2047, a virtual switch 2673, and a fast path entry generator 2675, in some embodiments. In some embodiments, I/O ASIC 2047 is configured to perform fast path processing for data messages to and from compute nodes executing on host computers connected to smart NIC 2640. In some embodiments, for first data messages in a data message flow (e.g., data message 2680), the I/O ASIC 2047 is programmed to provide data message 2680 to a virtual switch 2673 executing in the NIC OS 2660. The virtual switch 2673 processes the data message 2680 through a processing pipeline 2674. Processing pipeline 2674 includes operations 2674a-n. The operations 2674a-n, in some embodiments, include a set of logical forwarding operations (logical switching, bridging, routing, etc.). In some embodiments, the operations 2674a-n also include a set of middlebox services (e.g., firewall, load balancing, deep packet inspection, etc.) enabled for a particular logical network (e.g., belonging to a particular logical tenant associated with the data message 2680). The processing pipeline 2674, in some embodiments, identifies a particular set of data message attributes 2681 used to identify a data message flow or set of data message flows to which the data message 2680 belongs and determines a particular set of actions (e.g., slow path result 2682) to take for future data messages matching the identified data message attributes 2681. The data message attributes 2681 and slow path results 2682 are then provided to fast path entry generator 2675 to be combined into a fast path entry 2691 that is programmed into I/O ASIC 2047 to process future data messages having attributes that match the identified data message attributes 2681.
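
This slow-path/fast-path interaction can be illustrated with a small flow-cache sketch (hypothetical; a dictionary stands in for the entries programmed into the I/O ASIC):

    # Hypothetical flow-cache sketch of the FIG. 26 interaction.
    fast_path: dict = {}

    def slow_path(packet: dict) -> str:
        # The full pipeline (logical forwarding plus middlebox services)
        # yields the flow-identifying attributes (cf. 2681) and the actions
        # to apply to matching messages (cf. slow path result 2682).
        attrs = (packet["src"], packet["dst"], packet["proto"])
        result = "forward:port2"
        fast_path[attrs] = result          # cf. fast path entry 2691
        return result

    def process(packet: dict) -> str:
        attrs = (packet["src"], packet["dst"], packet["proto"])
        hit = fast_path.get(attrs)         # fast path lookup in the ASIC
        return hit if hit is not None else slow_path(packet)

    pkt = {"src": "10.0.0.1", "dst": "10.0.0.2", "proto": 6}
    assert process(pkt) == process(pkt)    # second message hits the fast path

The first data message of a flow takes the slow path and installs an entry; subsequent messages with matching attributes are handled by the fast path alone.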

FIG. 27 illustrates a system 2700 including a smart NIC 2740 and a set of host computers 2710A-J connected to the smart NIC 2740 through two different PCIe buses 2742A and 2742J. The smart NIC 2740 has a NIC OS 2760. Each PCIe bus 2742 is used to present virtualized elements of the smart NIC 2740 as different devices connected to the host computers 2710A-J. For example, PFs 2770a and 2770n present as NICs, while PFs 2770b and 2770(n−1) present as connected storage devices. PF 2770a, as shown, presents a set of VFs 2771 as passthrough ports bypassing a virtual switch (not shown) of host computer 2710A. PFs 2770b and 2770(n−1) appear to the host computers 2710A and 2710J, the hypervisors 2730a-j executing on these computers, and the compute nodes 2711a-b that execute on the hypervisors 2730 as (emulated) local storages 2765a and 2765b connected by a PCIe bus 2742. The emulated local storages 2765 may appear as a local storage, a virtual storage area network, or a virtual volume. In some embodiments, the storage virtualization 2263 backs the emulated local storage 2765 with a virtualized storage using non-volatile memory express (NVMe) or NVMe over fabrics (NVMe-oF) 2766. The virtualized storage, in some embodiments, communicates with an external storage that is located on multiple physical storage devices 2780a-p. The communication, in some embodiments, uses NVMe-oF based on remote direct memory access (RDMA) or transmission control protocol (TCP).
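
As a rough sketch of this storage emulation (hypothetical names; real NVMe-oF transports and command sets are considerably more involved), the emulated local device can simply forward block reads and writes to remote targets:

    # Hypothetical sketch: an emulated local NVMe device backed by remote
    # NVMe-oF targets; real transports (RDMA/TCP) differ substantially.
    class NvmeOfTarget:
        def __init__(self, addr: str, transport: str = "tcp"):  # or "rdma"
            self.addr, self.transport = addr, transport
            self.blocks: dict = {}

        def write(self, lba: int, data: bytes) -> None:
            self.blocks[lba] = data

        def read(self, lba: int) -> bytes:
            return self.blocks.get(lba, b"\x00" * 512)

    class EmulatedLocalNvme:
        """Presented to the host over PCIe as if it were a local disk."""
        def __init__(self, targets: list):
            self.targets = targets

        def _target_for(self, lba: int) -> NvmeOfTarget:
            # Simple striping across the distributed physical storages.
            return self.targets[lba % len(self.targets)]

        def write(self, lba: int, data: bytes) -> None:
            self._target_for(lba).write(lba, data)

        def read(self, lba: int) -> bytes:
            return self._target_for(lba).read(lba)

    disk = EmulatedLocalNvme([NvmeOfTarget("192.0.2.8"),
                              NvmeOfTarget("192.0.2.9", transport="rdma")])
    disk.write(7, b"x" * 512)
    assert disk.read(7) == b"x" * 512

The host sees one local disk while the data actually lands on a distributed set of remote storages, which is the appearance described above.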

FIG. 28 conceptually illustrates an electronic system 2800 with which some embodiments of the invention are implemented. The electronic system 2800 can be used to execute any of the control, virtualization, or operating system applications described above. The electronic system 2800 may be a computer (e.g., a desktop computer, personal computer, tablet computer, server computer, mainframe, a blade computer, etc.), phone, PDA, or any other sort of electronic device. Such an electronic system includes various types of computer readable media and interfaces for various other types of computer readable media. Electronic system 2800 includes a bus 2805, processing unit(s) 2810, a system memory 2825, a read-only memory 2830, a permanent storage device 2835, input devices 2840, and output devices 2845.

The bus 2805 collectively represents all system, peripheral, and chipset buses that communicatively connect the numerous internal devices of the electronic system 2800. For instance, the bus 2805 communicatively connects the processing unit(s) 2810 with the read-only memory 2830, the system memory 2825, and the permanent storage device 2835.

From these various memory units, the processing unit(s) 2810 retrieve instructions to execute and data to process in order to execute the processes of the invention. The processing unit(s) may be a single processor or a multi-core processor in different embodiments.

The read-only-memory (ROM) 2830 stores static data and instructions that are needed by the processing unit(s) 2810 and other modules of the electronic system. The permanent storage device 2835, on the other hand, is a read-and-write memory device. This device is a non-volatile memory unit that stores instructions and data even when the electronic system 2800 is off. Some embodiments of the invention use a mass-storage device (such as a magnetic or optical disk and its corresponding disk drive) as the permanent storage device 2835.

Other embodiments use a removable storage device (such as a floppy disk, flash drive, etc.) as the permanent storage device. Like the permanent storage device 2835, the system memory 2825 is a read-and-write memory device. However, unlike storage device 2835, the system memory is a volatile read-and-write memory, such as a random access memory. The system memory 2825 stores some of the instructions and data that the processor needs at runtime. In some embodiments, the invention's processes are stored in the system memory 2825, the permanent storage device 2835, and/or the read-only memory 2830. From these various memory units, the processing unit(s) 2810 retrieve instructions to execute and data to process in order to execute the processes of some embodiments.

The bus 2805 also connects to the input and output devices 2840 and 2845. The input devices 2840 enable the user to communicate information and select commands to the electronic system. The input devices 2840 include alphanumeric keyboards and pointing devices (also called “cursor control devices”). The output devices 2845 display images generated by the electronic system 2800. The output devices 2845 include printers and display devices, such as cathode ray tubes (CRT) or liquid crystal displays (LCD). Some embodiments include devices such as a touchscreen that function as both input and output devices.

Finally, as shown in FIG. 28, bus 2805 also couples electronic system 2800 to a network 2865 through a network adapter (not shown). In this manner, the computer can be a part of a network of computers (such as a local area network (“LAN”), a wide area network (“WAN”), or an Intranet), or a network of networks, such as the Internet. Any or all components of electronic system 2800 may be used in conjunction with the invention.

Some embodiments include electronic components, such as microprocessors, storage, and memory that store computer program instructions in a machine-readable or computer-readable medium (alternatively referred to as computer-readable storage media, machine-readable media, or machine-readable storage media). Some examples of such computer-readable media include RAM, ROM, read-only compact discs (CD-ROM), recordable compact discs (CD-R), rewritable compact discs (CD-RW), read-only digital versatile discs (e.g., DVD-ROM, dual-layer DVD-ROM), a variety of recordable/rewritable DVDs (e.g., DVD-RAM, DVD-RW, DVD+RW, etc.), flash memory (e.g., SD cards, mini-SD cards, micro-SD cards, etc.), magnetic and/or solid state hard drives, read-only and recordable Blu-Ray® discs, ultra-density optical discs, any other optical or magnetic media, and floppy disks. The computer-readable media may store a computer program that is executable by at least one processing unit and includes sets of instructions for performing various operations. Examples of computer programs or computer code include machine code, such as is produced by a compiler, and files including higher-level code that are executed by a computer, an electronic component, or a microprocessor using an interpreter.

While the above discussion primarily refers to microprocessor or multi-core processors that execute software, some embodiments are performed by one or more integrated circuits, such as application-specific integrated circuits (ASICs) or field-programmable gate arrays (FPGAs). In some embodiments, such integrated circuits execute instructions that are stored on the circuit itself.

As used in this specification, the terms “computer”, “server”, “processor”, and “memory” all refer to electronic or other technological devices. These terms exclude people or groups of people. For the purposes of the specification, the terms “display” or “displaying” mean displaying on an electronic device. As used in this specification, the terms “computer readable medium,” “computer readable media,” and “machine readable medium” are entirely restricted to tangible, physical objects that store information in a form that is readable by a computer. These terms exclude any wireless signals, wired download signals, and any other ephemeral signals.

This specification refers throughout to computational and network environments that include virtual machines (VMs). However, virtual machines are merely one example of data compute nodes (DCNs) or data compute end nodes, also referred to as addressable nodes. DCNs may include non-virtualized physical hosts, virtual machines, containers that run on top of a host operating system without the need for a hypervisor or separate operating system, and hypervisor kernel network interface modules.

VMs, in some embodiments, operate with their own guest operating systems on a host using resources of the host virtualized by virtualization software (e.g., a hypervisor, virtual machine monitor, etc.). The tenant (i.e., the owner of the VM) can choose which applications to operate on top of the guest operating system. Some containers, on the other hand, are constructs that run on top of a host operating system without the need for a hypervisor or separate guest operating system. In some embodiments, the host operating system uses name spaces to isolate the containers from each other and therefore provides operating-system level segregation of the different groups of applications that operate within different containers. This segregation is akin to the VM segregation that is offered in hypervisor-virtualized environments that virtualize system hardware, and thus can be viewed as a form of virtualization that isolates different groups of applications that operate in different containers. Such containers are more lightweight than VMs.

Hypervisor kernel network interface modules, in some embodiments, are non-VM DCNs that include a network stack with a hypervisor kernel network interface and receive/transmit threads. One example of a hypervisor kernel network interface module is the vmknic module that is part of the ESXi™ hypervisor of VMware, Inc.

It should be understood that while the specification refers to VMs, the examples given could be any type of DCNs, including physical hosts, VMs, non-VM containers, and hypervisor kernel network interface modules. In fact, the example networks could include combinations of different types of DCNs in some embodiments.

While the invention has been described with reference to numerous specific details, one of ordinary skill in the art will recognize that the invention can be embodied in other specific forms without departing from the spirit of the invention. For instance, several examples were provided above by reference to specific distributed storage processes, such as vSAN. One of ordinary skill will realize that other embodiments use other distributed storage services (e.g., vVol offered by VMware, Inc.). The vSAN and vVol services of some embodiments are further described in U.S. Pat. Nos. 8,775,773 and 9,665,235, which are hereby incorporated by reference. Thus, one of ordinary skill in the art would understand that the invention is not to be limited by the foregoing illustrative details, but rather is to be defined by the appended claims.

We claim:
1. A method of emulating a local storage for a host computer comprising a network interface card (NIC), the method comprising, on the NIC: deploying a storage emulator first program to emulate a local storage, from a plurality of external storages accessed through the NIC, for a set of processes executing on the host computer; and deploying a storage service second program to perform a set of distributed storage services that account for the set of processes on the host computer lacking knowledge of the plurality of external storages being used to emulate the local storage.
2. The method of claim 1, wherein the set of distributed storage services comprises data efficiency processes including at least one of deduplication operations and compression operations.
3. The method of claim 1, wherein the set of distributed storage services comprises security processes including at least one of end-to-end encryption and access control operations.
4. The method of claim 1, wherein the set of distributed storage services comprises data and life cycle management, including at least one of storage vMotion, snapshot operations, snapshot schedules, cloning, disaster recovery, backup, and long-term storage.
5. The method of claim 1, wherein the set of distributed storage services improves performance, resiliency, and security of the set of host processes' access to the plurality of external storages through the NIC.
6. The method of claim 1, wherein the set of distributed storage services comprises encryption services that encrypt the read and write requests and responses that are exchanged between the set of host processes and the plurality of external storages that are made to appear as the local storage.
7. The method of claim 6, wherein the storage service second program is a proxy program that uses a storage service program executing on another device to provide one or more of the distributed storage services in the set of distributed storage services.
8. The method of claim 6 further comprising deploying a network fabric driver on the NIC to access the plurality of external storages through one or more intervening networks.
9. The method of claim 8, wherein the network fabric driver is a non-volatile memory express over fabric (NVMeOF) driver.
10. The method of claim 9, wherein the storage service second program communicates through the NVMeOF driver with other storage service programs that provide the distributed storage service for a plurality of compute and storage nodes.
11. The method of claim 1, wherein the set of processes comprise an operating system executing on the host computer.
12. The method of claim 1, wherein the set of processes comprise a set of machines executing on the host computer.
13. The method of claim 1, wherein the set of processes comprise a hypervisor executing on the host computer.
14. A non-transitory machine readable medium storing sets of instructions for emulating a local storage for a host computer comprising a network interface card (NIC), the sets of instructions for execution by at least one processing unit of the NIC, the sets of instructions for: emulating a local storage, from a plurality of external storages accessed through the NIC, for a set of processes executing on the host computer; and performing a set of distributed storage services that account for the set of processes on the host computer lacking knowledge of the plurality of external storages being used to emulate the local storage.

15. The non-transitory machine readable medium of claim 14, wherein the set of distributed storage services comprises data efficiency processes including at least one of deduplication operations and compression operations.
16. The non-transitory machine readable medium of claim 14, wherein the set of distributed storage services comprises security processes including at least one of end-to-end encryption and access control operations.
17. The non-transitory machine readable medium of claim 14, wherein the set of distributed storage services comprises data and life cycle management, including at least one of storage vMotion, snapshot operations, snapshot schedules, cloning, disaster recovery, backup, and long-term storage.
18. The non-transitory machine readable medium of claim 14, wherein the set of distributed storage services improves performance, resiliency, and security of the set of host processes' access to the plurality of external storages through the NIC.

19. The non-transitory machine readable medium of claim 14, wherein the set of distributed storage services comprises encryption services that encrypt the read and write requests and responses that are exchanged between the set of host processes and the plurality of external storages that are made to appear as the local storage.
20. The non-transitory machine readable medium of claim 14, wherein the sets of instructions are further for using a network fabric driver on the NIC to access the plurality of external storages through one or more intervening networks.